
A general approach to simultaneous model fitting and variable elimination in response models for biological data with many more variables than observations

Author: A Dempster
A Spira
B Schölkopf
C Ambroise
DA Hinds
DR Cox
GN Watson
Harri T Kiiveri
HT Kiiveri
I Guyon
JA Nelder
JC Platt
JX Zhu
L Breiman
M Abramowitz
M Figueiredo
ME Ross
MY Park
P McCullagh
R Tibshirani
RDC Team
S Kotz
S Zhang
SA Tomlins
SS Dave
SS Keerthi
T Zhang
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Background: With the advent of high-throughput biotechnology data acquisition platforms such as microarrays, SNP chips and mass spectrometers, data sets with many more variables than observations are now routinely being collected. Finding relationships between response variables of interest and variables in such data sets is an important problem akin to finding needles in a haystack. Whilst methods for a number of response types have been developed, a general approach has been lacking. Results: The major contribution of this paper is a unified methodology which allows many common (statistical) response models to be fitted to such data sets. The class of models includes virtually any model with a linear predictor in it, for example (but not limited to) multiclass logistic regression (classification), generalised linear models (regression) and survival models. A fast algorithm for finding sparse, well-fitting models is presented. The ideas are illustrated on real data sets with numbers of variables ranging from thousands to millions. R code implementing the ideas is available for download. Conclusion: The method described in this paper enables existing work on response models with fewer variables than observations to be extended to the situation where there are many more variables than observations. It is a powerful approach to finding parsimonious models for such data sets. The method is capable of handling problems with millions of variables and a large variety of response types within the one framework. The method compares favourably to existing methods such as support vector machines and random forests, but has the advantage of not requiring separate variable selection steps. It also works for data types which these methods were not designed to handle. The method usually produces very sparse models, which makes biological interpretation simpler and more focused.
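The abstract does not spell out the paper's algorithm, so the following is only a minimal sketch of the general idea of fitting a model while eliminating variables. It swaps in a different, well-known technique, the lasso (L1-penalised least squares) solved by cyclic coordinate descent. The function names, the toy data and the penalty value `lam` are assumptions made for illustration and are not taken from the paper or its R code.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p entries each; y has n entries.
    This is the lasso, used here as a stand-in, not the paper's method.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # Correlation of column j with the residual that excludes j's own term.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_beta
    return beta


# Toy data with more variables (5) than observations (3); values are made up.
X = [
    [1.0, 0.0, 0.5, 0.0, 0.1],
    [0.0, 1.0, 0.5, 0.0, 0.2],
    [-1.0, 0.0, 0.0, 1.0, -0.1],
]
y = [2.0, 0.0, -2.0]

beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", beta)
print("variables kept:", selected)
```

Raising `lam` drives more coefficients to exactly zero, which is how a single fitting step can also act as variable elimination.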

Springer - Publisher Connector

Physiological wireless sensor network for the detection of human moods to enhance human-robot interaction

Author: A Burns
AM Isen
B Mali
BW White
DW Aha
F-C Kao
Filippo Cavallo
GI Webb
H-M Wang
J-H Kim
Katharina Lochner
M Bradley
M Chen
N Lippman
RW Picard
S Koelstra
S Kreibig
SS Keerthi
T Tamura
UR Acharya
W Boucsein
William W. Cohen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

Florence Research

Feature engineering and a proposed decision-support system for systematic reviewers of medical evidence

Author: A Jimeno-Yepes
AM Cohen
BC Wallace
BR Luce
Christian Lovis
Dina Demner-Fushman
DM Blei
Eugene Tseytlin
F Boudin
G Del Fiol
H Bastian
H Kilicoglu
I Mierswa
J Chandler
J Yaffe
KA McKibbon
Kevin J. Mitchell
M Steyvers
MF Porter
NL Wilczynski
O Frunza
Q Zou
QT Zeng
R Klinkenberg
S Matwin
SR Dalal
SS Keerthi
T Bekhuis
T Hofmann
Tanja Bekhuis
TL Griffiths
X Huang
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 27/01/2014
Field of study

Objectives: Evidence-based medicine depends on the timely synthesis of research findings. An important source of synthesized evidence resides in systematic reviews. However, a bottleneck in review production involves dual screening of citations with titles and abstracts to find eligible studies. For this research, we tested the effect of various kinds of textual information (features) on performance of a machine learning classifier. Based on our findings, we propose an automated system to reduce screening burden, as well as offer quality assurance. Methods: We built a database of citations from 5 systematic reviews that varied with respect to domain, topic, and sponsor. Consensus judgments regarding eligibility were inferred from published reports. We extracted 5 feature sets from citations: alphabetic, alphanumeric+, indexing, features mapped to concepts in systematic reviews, and topic models. To simulate a two-person team, we divided the data into random halves. We optimized the parameters of a Bayesian classifier, then trained and tested models on alternate data halves. Overall, we conducted 50 independent tests. Results: All tests of summary performance (mean F3) surpassed the corresponding baseline, P<0.0001. The ranks for mean F3, precision, and classification error were statistically different across feature sets averaged over reviews; P-values for Friedman's test were .045, .002, and .002, respectively. Differences in ranks for mean recall were not statistically significant. Alphanumeric+ features were associated with best performance; mean reduction in screening burden for this feature type ranged from 88% to 98% for the second pass through citations and from 38% to 48% overall. Conclusions: A computer-assisted, decision support system based on our methods could substantially reduce the burden of screening citations for systematic review teams and solo reviewers. Additionally, such a system could deliver quality assurance both by confirming concordant decisions and by naming studies associated with discordant decisions for further consideration. © 2014 Bekhuis et al.
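The summary measure F3 in the abstract above is presumably the standard F-beta score with beta = 3, which weights recall more heavily than precision; that reading is an assumption, since the abstract does not define it. A minimal sketch of the computation follows, with made-up precision and recall values that are not results from the study.

```python
def f_beta(precision: float, recall: float, beta: float = 3.0) -> float:
    """Weighted harmonic mean of precision and recall.

    With beta = 3, recall counts roughly nine times as much as precision,
    which suits screening tasks where missing an eligible study is costly.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # both precision and recall are zero
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical classifier: modest precision, high recall.
print(round(f_beta(precision=0.5, recall=0.9), 3))  # 0.833
# The same numbers under the balanced F1 score, for comparison.
print(round(f_beta(precision=0.5, recall=0.9, beta=1.0), 3))  # 0.643
```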

D-Scholarship@Pitt

FigShare

Support Vector Machine Implementations for Classification & Clustering

Author: A Ben-Hur
A Meller
AI Khinchine
Anil Yelundur
B Scholkopf
CE Shannon
Charlie McChesney
CJC Burges
CW Hsu
E Jaynes
E Osuna
J Yang
JC Platt
JJ Kasianowicz
JN Kapur
K Crammer
M Akeson
M Diserbo
Matthew Landry
R Durbin
RE Schapire
S Amari
S Kullback
S Winters-Hilt
SM Bezrukov
SS Keerthi
Stephen Winters-Hilt
T Joachims
VN Vapnik
W Vercoutere
Y Freund
Y Lee
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

BACKGROUND: We describe Support Vector Machine (SVM) applications to classification and clustering of channel current data. SVMs are variational-calculus based methods that are constrained to have structural risk minimization (SRM), i.e., they provide noise tolerant solutions for pattern recognition. The SVM approach encapsulates a significant amount of model-fitting information in the choice of its kernel. In work thus far, novel, information-theoretic, kernels have been successfully employed for notably better performance over standard kernels. Currently there are two approaches for implementing multiclass SVMs. One is called external multi-class that arranges several binary classifiers as a decision tree such that they perform a single-class decision making function, with each leaf corresponding to a unique class. The second approach, namely internal-multiclass, involves solving a single optimization problem corresponding to the entire data set (with multiple hyperplanes). RESULTS: Each SVM approach encapsulates a significant amount of model-fitting information in its choice of kernel. In work thus far, novel, information-theoretic, kernels were successfully employed for notably better performance over standard kernels. Two SVM approaches to multiclass discrimination are described: (1) internal multiclass (with a single optimization), and (2) external multiclass (using an optimized decision tree). We describe benefits of the internal-SVM approach, along with further refinements to the internal-multiclass SVM algorithms that offer significant improvement in training time without sacrificing accuracy. In situations where the data isn't clearly separable, making for poor discrimination, signal clustering is used to provide robust and useful information – to this end, novel, SVM-based clustering methods are also described. As with the classification, there are Internal and External SVM Clustering algorithms, both of which are briefly described
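The "external multiclass" arrangement mentioned in the abstract above, several binary classifiers organised as a decision tree with one class per leaf, can be sketched structurally as follows. The binary decision functions here are hand-written threshold rules standing in for trained SVMs; they, the class labels and the tree shape are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class Leaf:
    label: str  # each leaf corresponds to exactly one class


@dataclass
class Node:
    # Binary classifier at this node: True sends the sample to the right branch.
    decide: Callable[[Sequence[float]], bool]
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], x: Sequence[float]) -> str:
    """Walk from the root to a leaf, applying one binary decision per level."""
    while isinstance(tree, Node):
        tree = tree.right if tree.decide(x) else tree.left
    return tree.label


# Stand-ins for trained binary SVMs (assumed decision rules, not fitted models).
tree = Node(
    decide=lambda x: x[0] > 0.0,      # root separates class C from {A, B}
    left=Node(
        decide=lambda x: x[1] > 0.0,  # second classifier separates B from A
        left=Leaf("A"),
        right=Leaf("B"),
    ),
    right=Leaf("C"),
)

for sample in [(-1.0, -1.0), (-1.0, 2.0), (3.0, 0.0)]:
    print(sample, "->", classify(tree, sample))
```

By contrast, the internal multiclass approach described in the abstract solves a single optimisation problem over all classes, so it has no such tree of separate classifiers.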

Springer - Publisher Connector

Public Library of Science (PLOS)

Using Expression and Genotype to Predict Drug Response in Yeast

Author: A Ooyama
CM Harford
David C. Roberts
Douglas M. Ruderfer
E Marrer
EO Perlstein
Ethan O. Perlstein
HC Kang
HS Kim
HW Lo
I Guyon
I Ifergan
IH Witten
J Platt
JE Staunton
JJ Swen
K Fujita
KM O'Shaughnessy
L Quintieri
Leonid Kruglyak
LM Baudhuin
LW Chinn
MH Court
R Benz
RB Brem
RS Huang
S Dan
SF Grant
SS Keerthi
Stuart L. Schreiber
T Hastie
T Lynch
Thomas Preiss
TJ Lynch
U Christians
UT Shankavaram
WA Weber
Y Huang
Y Ma
Publication venue: Public Library of Science
Publication date: 01/09/2009
Field of study

Personalized, or genomic, medicine entails tailoring pharmacological therapies according to individual genetic variation at genomic loci encoding proteins in drug-response pathways. It has been previously shown that steady-state mRNA expression can be used to predict the drug response (i.e., sensitivity or resistance) of non-genotyped mammalian cancer cell lines to chemotherapeutic agents. In a real-world setting, clinicians would have access to both steady-state expression levels of patient tissue(s) and a patient's genotypic profile, and yet the predictive power of transcripts versus markers is not well understood. We have previously shown that a collection of genotyped and expression-profiled yeast strains can provide a model for personalized medicine. Here we compare the predictive power of 6,229 steady-state mRNA transcript levels and 2,894 genotyped markers using a pattern recognition algorithm. We were able to predict with over 70% accuracy the drug sensitivity of 104 individual genotyped yeast strains derived from a cross between a laboratory strain and a wild isolate. We observe that, independently of drug mechanism of action, both transcripts and markers can accurately predict drug response. Marker-based prediction is usually more accurate than transcript-based prediction, likely reflecting the genetic determination of gene expression in this cross

Online Research @ Cardiff

Harvard University - DASH

eScholarship - University of California

Online Learning for 3D LiDAR-based Human Detection: Experimental Analysis of Point Cloud Clustering and Classification Methods

Author: A Teichman
BJ Mohler
C Cortes
CC Chang
J Dequaire
K Li
L Beyer
L Sun
M Everingham
M Munaro
N Bellotto
Nicola Bellotto
S Bianco
SJ Julier
SS Keerthi
T Krajnik
Tom Duckett
XR Li
Z Kalal
Zhi Yan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020
Field of study

This paper presents a system for online learning of human classifiers by mobile service robots using 3D LiDAR sensors, and its experimental evaluation in a large indoor public space. The learning framework requires a minimal set of labelled samples (e.g. one or several samples) to initialise a classifier. The classifier is then retrained iteratively during operation of the robot. New training samples are generated automatically using multi-target tracking and a pair of "experts" to estimate false negatives and false positives. Both classification and tracking utilise an efficient real-time clustering algorithm for segmentation of 3D point cloud data. We also introduce a new feature to improve human classification in sparse, long-range point clouds. We provide an extensive evaluation of our framework using a 3D LiDAR dataset of people moving in a large indoor public space, which is made available to the research community. The experiments demonstrate the influence of the system components and improved classification of humans compared to the state-of-the-art.

University of Lincoln Institutional Repository

Archivio istituzionale della ricerca - Università di Padova

Combined artificial bee colony algorithm and machine learning techniques for prediction of online consumer repurchase intention

Author: A Bilgihan
A Field
A Kumar
A Liaw
A Merle
A Varma Citrin
ACC Lu
Anil Kumar
B Akay
BF Blake
C Herbes
C Hsing Wu
C Kim
C Rygielski
CA Kochukalam
CH Jin
CL Hsu
CS Ling
D Karaboga
DH Park
EJ Johnson
EJ Lee
Eswara Krishna Mussada
G Wagner
Gaurav Kabra
GD Pires
H He
H Heijden Van der
HH Lee
HW Kim
I Colantone
I Küster
IP Riquelme
J Gao
JC Nunnally
JCF Caballero
JF Hair
JL Seng
JM Field
JR Quinlan
JY Lai
K McCullough Johnston
KC Lee
KO Cowart
L Mahdjoubi
LF Bright
LP Forbes
M Li
M Rizwan
M Sicilia
Manoj Kumar Dash
ME Nissen
MH Aghdaie
MH Yang
MO Rieger
MY Lee
N Azad
N Duch-Brown
N Pappas
O Merlo
PJ Danaher
Prashant Singh Rana
Q Zhou
R Maniak
R Mohanty
RE Goldsmith
RM Al-dweeri
S Azizi
S Lysonski
S Thirumalai
SH Zolfani
SP Jeng
SS Keerthi
T Hastie
T Verhagen
TL Childers
TY Chan
Y Wang
YA Park
YC Chen
YH Hung
YJ Kang
YK Kim
Z Soltani
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

A novel paradigm in the service sector, i.e. services through the web, is a progressive mechanism for rendering offerings over diverse environments. The Internet provides huge opportunities for companies to provide personalized online services to their customers. However, the rapid introduction of novel web services may adversely affect quality and user satisfaction. Consequently, prediction of consumer intention is of great importance in selecting web services for an application. The aim of this study is to predict online consumer repurchase intention; to achieve this objective, a hybrid approach combining machine learning techniques and the Artificial Bee Colony (ABC) algorithm has been used. The study is divided into three phases. First, shopping mall and consumer characteristics for repurchase intention were identified through an extensive literature review. Second, ABC was used for feature selection over consumers' characteristics and shopping malls' attributes (with a threshold value > 0.1) for the prediction model. Finally, K-fold cross-validation was employed to measure the robustness of the best classification model. The classification models, viz. Decision Trees (C5.0), AdaBoost, Random Forest (RF), Support Vector Machine (SVM) and Neural Network (NN), are used to predict consumer purchase intention. Performance evaluation of the identified models on training-testing partitions (70-30%) of the data set shows that the AdaBoost method outperforms the other classification models, with sensitivity and accuracy of 0.95 and 97.58% respectively on the testing data set. This study is a revolutionary attempt that considers both shopping mall and consumer characteristics in examining consumer purchase intention.
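The evaluation vocabulary in the abstract above (K-fold cross-validation, sensitivity, accuracy) can be illustrated with a short sketch. The helper names, the fold count and the toy labels below are assumptions for illustration only; none of them reproduce the study's data or its reported figures.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle range(n_samples) and split it into k disjoint folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def sensitivity_and_accuracy(y_true, y_pred, positive=1):
    """Return (sensitivity, accuracy) for paired true and predicted labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    positives = true_pos + false_neg
    sensitivity = true_pos / positives if positives else 0.0
    return sensitivity, correct / len(pairs)


# Each fold serves once as the held-out test set; the rest form the training set.
folds = k_fold_indices(n_samples=10, k=3)
for held_out in folds:
    train = [i for fold in folds if fold is not held_out for i in fold]
    print("test:", sorted(held_out), "train size:", len(train))

# Toy labels (1 = repurchase, 0 = no repurchase); values are made up.
print(sensitivity_and_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 0.75)
```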

UDORA - University of Derby Online Research Archive

Multi technique amalgamation for enhanced information identification with content based image data

Author: A Raventós
BM Mehtre
D Zhang
F Mokhtarian
HB Kekre
J Li
J Yue
M Banerjee
M Flickner
M Subrahmanyam
ME ElAlami
N Otsu
PS Hiremath
Q Zhu
RM Madireddy
S Sridhar
S Thepade
SH Shaikh
SS Keerthi
T Gevers
WY Kim
X Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Classification and Analysis of Regulatory Pathways Using Graph Property, Biochemical and Physicochemical Property, and Functional Property

Author: A Bairoch
A Barabasi
C Chen
C Klukas
C Krieger
Cathal Seoighe
CF Gao
D Chakrabarti
D Frishman
DN Georgiou
E Camon
F Chiti
G Pollastri
GF Cooper
GP Zhou
GY Zhang
H Ding
H Lin
H Mohabatkar
H Ogata
H Peng
I Althaus
I Dubchak
I Schomburg
IH Witten
J Andraos
J Cheng
JD Qiu
JM Dale
K Chou
KC Chou
Kuo-Chen Chou
L Chen
L Lu
L Yu
Lei Chen
M Chang
M Esmaeili
M Kanehisa
N Chazal
N Friedman
P Carmona-Saez
P Pharkya
Q Gu
R Caspi
RR Bouckaert
S Salzberg
SS Keerthi
T Denoeux
T Huang
Tao Huang
U Stelzl
W Buntine
X Xiao
XB Zhou
Y Cai
Y Qi
YH Zeng
YS Lobanova
Yu-Dong Cai
Z He
ZC Wu
Publication venue: Public Library of Science
Publication date: 01/01/2011
Field of study

Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is quite fundamental and essential in systems biology and proteomics. This is also an extremely difficult and challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned is formulated as a 5570-D (dimensional) vector; (ii) each of the components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating the outcome is quite promising and encouraging. To the best of our knowledge, the current study represents the first effort in attempting to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area.
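The jackknife test mentioned in the abstract above is a leave-one-out cross-validation: each sample is held out in turn, predicted from all the others, and the fraction of correct predictions is reported as the success rate. The sketch below shows that procedure with a simple nearest-neighbour predictor, which is an assumed stand-in and not the classifier used in the study; the two-dimensional toy samples and labels are likewise made up.

```python
def nearest_neighbour_label(train, query):
    """Return the label of the training sample closest to query (squared Euclidean distance)."""
    closest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return closest[1]


def jackknife_success_rate(samples):
    """Hold out each sample once, predict it from the rest, and return the hit fraction."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbour_label(rest, features) == label:
            hits += 1
    return hits / len(samples)


# Toy feature vectors with class labels; the last one sits among class "B" points.
toy = [
    ((0.0, 0.1), "A"),
    ((0.2, 0.0), "A"),
    ((1.0, 1.1), "B"),
    ((0.9, 1.0), "B"),
    ((0.7, 0.9), "A"),
]
print(jackknife_success_rate(toy))  # 0.8
```

Because every sample is tested exactly once and there is no random split, the jackknife yields a single reproducible figure for a given dataset and predictor.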

CiteSeerX